Fix DOCX tables inside SDT content controls not being parsed #3658

Closed
aaarif796 wants to merge 1 commit into docling-project:main from aaarif796:fix-docx-table-markdown

Conversation

@aaarif796

Summary

Fix parsing of DOCX tables contained within Structured Document Tags (SDT/content controls).

Root Cause

The SDT handler processed only paragraph elements (w:p) from w:sdtContent and ignored table elements (w:tbl).

As a result, tables embedded inside SDT content controls were flattened into plain text during DOCX conversion, causing Markdown export to lose table structure.

Fix

Handle table elements contained within w:sdtContent and route them through _handle_tables() while preserving existing paragraph handling.
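
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it uses the standard-library `xml.etree.ElementTree` instead of the backend's real parsing stack, and the helper names `text_of` and `walk_sdt` are hypothetical. It only shows the idea of visiting every child of `w:sdtContent` and treating `w:tbl` as a table rather than dropping it.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by DOCX body XML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"

# Hand-written sample: an SDT holding one paragraph and one 2x2 table.
SAMPLE = f"""
<w:body xmlns:w="{W_NS}">
  <w:sdt>
    <w:sdtContent>
      <w:p><w:r><w:t>Intro paragraph</w:t></w:r></w:p>
      <w:tbl>
        <w:tr>
          <w:tc><w:p><w:r><w:t>A1</w:t></w:r></w:p></w:tc>
          <w:tc><w:p><w:r><w:t>B1</w:t></w:r></w:p></w:tc>
        </w:tr>
        <w:tr>
          <w:tc><w:p><w:r><w:t>A2</w:t></w:r></w:p></w:tc>
          <w:tc><w:p><w:r><w:t>B2</w:t></w:r></w:p></w:tc>
        </w:tr>
      </w:tbl>
    </w:sdtContent>
  </w:sdt>
</w:body>
""".strip()


def text_of(element):
    """Concatenate the text of all w:t runs below an element."""
    return "".join(t.text or "" for t in element.iter(f"{W}t"))


def walk_sdt(sdt):
    """Return (kind, payload) items for each child of w:sdtContent.

    Hypothetical stand-in for the SDT handler: paragraphs keep their
    existing path, and tables are kept as rows of cell text instead of
    being skipped or flattened.
    """
    content = sdt.find(f"{W}sdtContent")
    if content is None:
        return []
    items = []
    for child in content:
        if child.tag == f"{W}p":
            items.append(("paragraph", text_of(child)))
        elif child.tag == f"{W}tbl":
            rows = [
                [text_of(tc) for tc in tr.findall(f"{W}tc")]
                for tr in child.findall(f"{W}tr")
            ]
            items.append(("table", rows))
    return items


body = ET.fromstring(SAMPLE)
items = walk_sdt(body.find(f"{W}sdt"))
for kind, payload in items:
    print(kind, payload)
```

In the real backend the table branch would hand the element to `_handle_tables()` rather than building a list of rows; the row extraction here exists only to keep the sketch self-contained.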

Validation

  • Reproduced the issue using the DOCX attached to issue #3655 ("Table is not extracted correctly in DOCX").
  • Verified that tables inside SDT blocks are now parsed into TableItems.
  • Added a regression test covering tables embedded in SDT content controls.
  • Verified existing SDT-related tests continue to pass.

Before

Tables inside SDT blocks were flattened into text and exported as paragraphs.

After

Tables inside SDT blocks are correctly parsed and exported as Markdown tables.

Signed-off-by: Aarif Ansari <aaarif796@gmail.com>
@github-actions

Contributor

DCO Check Passed

Thanks @aaarif796, all your commits are properly signed off. 🎉

@mergify

mergify Bot commented Jun 19, 2026

Contributor

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Enforce conventional commit

Waiting for

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
This rule is failing.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
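
To see why the rule fails for this PR, the title can be checked against the pattern locally. The sketch below assumes the `~=` operator behaves like a case-sensitive regular-expression search, which is how Python's `re.search` works; the helper name `is_conventional` and the suggested replacement title are illustrative only and not taken from the project.

```python
import re

# Pattern copied from the merge protection rule above.
PATTERN = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return PATTERN.search(title) is not None


# Title of this PR as submitted: capital "Fix" and no "type:" prefix.
current_title = "Fix DOCX tables inside SDT content controls not being parsed"

# Hypothetical rewording that would satisfy the pattern.
suggested_title = "fix(docx): parse tables inside SDT content controls"

print(is_conventional(current_title))
print(is_conventional(suggested_title))
```

The optional `(scope)` and `!` groups mean titles such as `fix: ...`, `fix(docx): ...`, and `feat!: ...` are all accepted, as long as the type is lowercase and followed by a colon.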

@PeterStaar-IBM PeterStaar-IBM requested a review from ceberam June 19, 2026 07:55
@PeterStaar-IBM

Member

@aaarif796 Can you tell me what your fix adds compared to #3657?

@codecov

codecov Bot commented Jun 19, 2026

Codecov Report

❌ Patch coverage is 54.54545% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
docling/backend/msword_backend.py 54.54% 5 Missing ⚠️

📢 Thoughts on this report? Let us know!

@PeterStaar-IBM

Member

superseded by #3657
